Inner-Outer Bracket Models for Word Alignment using Hidden Blocks
نویسندگان
چکیده
Most statistical translation systems are based on phrase translation pairs, or “blocks”, which are obtained mainly from word alignment. We use blocks to infer better word alignment and improved word alignment which, in turn, leads to better inference of blocks. We propose two new probabilistic models based on the innerouter segmentations and use EM algorithms for estimating the models’ parameters. The first model recovers IBM Model-1 as a special case. Both models outperform bidirectional IBM Model-4 in terms of word alignment accuracy by 10% absolute on the F-measure. Using blocks obtained from the models in actual translation systems yields statistically significant improvements in Chinese-English SMT evaluation.
منابع مشابه
A Fast Fertility Hidden Markov Model for Word Alignment Using MCMC
A word in one language can be translated to zero, one, or several words in other languages. Using word fertility features has been shown to be useful in building word alignment models for statistical machine translation. We built a fertility hidden Markov model by adding fertility to the hidden Markov model. This model not only achieves lower alignment error rate than the hidden Markov model, b...
متن کاملHidden Markov Tree Model for Word Alignment
We propose a novel unsupervised word alignment model based on the Hidden Markov Tree (HMT) model. Our model assumes that the alignment variables have a tree structure which is isomorphic to the target dependency tree and models the distortion probability based on the source dependency tree, thereby incorporating the syntactic structure from both sides of the parallel sentences. In English-Japan...
متن کاملUnsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
This paper describes an unsupervised dynamic graphical model for morphological segmentation and bilingual morpheme alignment for statistical machine translation. The model extends Hidden Semi-Markov chain models by using factored output nodes and special structures for its conditional probability distributions. It relies on morpho-syntactic and lexical source-side information (part-of-speech, m...
متن کاملA Projection Extension Algorithm for Statistical Machine Translation
In this paper, we describe a phrase-based unigram model for statistical machine translation that uses a much simpler set of model parameters than similar phrasebased models. The units of translation are blocks – pairs of phrases. During decoding, we use a block unigram model and a word-based trigram language model. During training, the blocks are learned from source interval projections using a...
متن کاملUsing Word-Dependent Transition Models in HMM-Based Word Alignment for Statistical Machine Translation
In this paper, we present a Bayesian Learning based method to train word dependent transition models for HMM based word alignment. We present word alignment results on the Canadian Hansards corpus as compared to the conventional HMM and IBM model 4. We show that this method gives consistent and significant alignment error rate (AER) reduction. We also conducted machine translation (MT) experime...
متن کامل